Knowledge tracing (KT) aims to leverage students' learning histories to estimate their mastery levels on a set of pre-defined skills, based on which their future performance can be accurately predicted. In practice, a student's learning history comprises answers to sets of massed questions, each set known as a session, rather than merely a sequence of independent answers. Theoretically, students' learning dynamics can differ substantially within and across these sessions. Effectively modelling the dynamics of students' knowledge states both within and across sessions is therefore crucial for handling the KT problem. Most existing KT models treat a student's learning records as a single continuous sequence, without capturing the sessional shifts in the student's knowledge state. To address this issue, we propose a novel hierarchical transformer model, named HiTSKT, which comprises an interaction(-level) encoder to capture the knowledge a student acquires within a session, and a session(-level) encoder to summarise the knowledge acquired across past sessions. To predict an interaction in the current session, a knowledge retriever integrates the summarised past-session knowledge with the previous interactions' information into proper knowledge representations, which are then used to compute the student's current knowledge state. Additionally, to model the student's long-term forgetting behaviour across sessions, a power-law-decay attention mechanism is designed and deployed in the session encoder, allowing it to place greater emphasis on recent sessions. Extensive experiments on three public datasets demonstrate that HiTSKT achieves new state-of-the-art performance on all three datasets compared with six state-of-the-art KT models.
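The abstract does not spell out the power-law-decay attention formulation; a minimal sketch of one plausible version is given below, where the decay exponent `alpha`, the session-age indexing, and the single-query shape are all illustrative assumptions rather than the paper's exact design:

```python
import numpy as np

def power_law_decay_attention(q, K, V, session_ages, alpha=0.5):
    """Scaled dot-product attention over past-session summaries whose
    softmax weights are additionally decayed by a power law of session
    age, so recent sessions contribute more. Illustrative sketch only.

    q:            (d,)   query for the current session
    K, V:         (n, d) keys/values, one row per past session
    session_ages: (n,)   age in sessions, 1 = most recent
    """
    d = q.shape[-1]
    scores = K @ q / np.sqrt(d)                      # (n,) attention scores
    decay = session_ages.astype(float) ** (-alpha)   # power-law decay weights
    w = np.exp(scores - scores.max()) * decay        # decayed, unnormalised
    w /= w.sum()                                     # normalise to a distribution
    return w @ V                                     # (d,) summarised knowledge
```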
Deep neural networks (DNNs) have recently emerged as a promising tool for analyzing and solving complex differential equations arising in science and engineering applications. As an alternative to traditional numerical schemes, learning-based solvers utilize the representation power of DNNs to approximate input-output relations in an automated manner. However, the lack of physics-in-the-loop often makes it difficult to construct a neural network solver that simultaneously achieves high accuracy, low computational burden, and interpretability. In this work, focusing on a class of evolutionary PDEs characterized by decomposable operators, we show that the classical ``operator splitting'' numerical scheme for solving these equations can be exploited to design neural network architectures. This gives rise to a learning-based PDE solver, which we name Deep Operator-Splitting Network (DOSnet). Such a non-black-box network design is constructed from the physical rules and operators governing the underlying dynamics, contains learnable parameters, and is thus more flexible than the standard operator splitting scheme. Once trained, it enables fast solution of the same type of PDEs. To validate the special structure inside DOSnet, we take linear PDEs as the benchmark and give a mathematical explanation for the weight behavior. Furthermore, to demonstrate the advantages of our new AI-enhanced PDE solver, we train and validate it on several types of operator-decomposable differential equations. We also apply DOSnet to nonlinear Schr\"odinger equations (NLSE), which have important applications in signal processing for modern optical fiber transmission systems; experimental results show that our model achieves better accuracy and lower computational complexity than numerical schemes and baseline DNNs.
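For concreteness, the classical operator splitting scheme that DOSnet builds on, applied to the NLSE, alternates a linear dispersion sub-step in Fourier space with a pointwise nonlinear sub-step. A minimal first-order (Lie) split-step sketch follows; DOSnet's learnable variant of the operators is not public in this abstract, so only the classical baseline is shown:

```python
import numpy as np

def split_step_nlse(u0, dz, nz, dx, gamma=1.0):
    """First-order (Lie) operator splitting for the NLSE
        i u_z + 0.5 u_tt + gamma |u|^2 u = 0,
    alternating a linear dispersion sub-step in Fourier space with a
    pointwise nonlinear phase rotation. DOSnet keeps this alternating
    structure but replaces the fixed operators with learnable ones."""
    k = 2 * np.pi * np.fft.fftfreq(u0.size, d=dx)  # angular frequencies
    lin = np.exp(-0.5j * k**2 * dz)                # linear-step multiplier
    u = u0.astype(complex)
    for _ in range(nz):
        u = np.fft.ifft(lin * np.fft.fft(u))             # dispersion
        u = u * np.exp(1j * gamma * np.abs(u)**2 * dz)   # nonlinearity
    return u
```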
Neural network pruning has been a well-established compression technique for enabling deep learning models on resource-constrained devices. The pruned model is usually specialized to meet specific hardware platforms and training tasks (defined as deployment scenarios). However, existing pruning approaches rely heavily on training data to trade off model size, efficiency, and accuracy, which makes them ineffective for federated learning (FL) over distributed and confidential datasets. Moreover, the memory- and compute-intensive pruning process of most existing approaches cannot be handled by most FL devices with resource limitations. In this paper, we develop FedTiny, a novel distributed pruning framework for FL, to obtain specialized tiny models for memory- and computing-constrained participating devices with confidential local data. To alleviate biased pruning due to unseen heterogeneous data across devices, FedTiny introduces an adaptive batch normalization (BN) selection module to adaptively obtain an initially pruned model fitting the deployment scenario. In addition, to further refine the initial pruning, FedTiny develops a lightweight progressive pruning module for local finer-grained pruning under tight memory and computational budgets, where the pruning policy for each layer is determined gradually rather than by evaluating the overall deep model structure at once. Extensive experimental results demonstrate the effectiveness of FedTiny, which outperforms state-of-the-art baseline approaches, especially when compressing deep models into extremely sparse tiny models.
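The abstract does not disclose FedTiny's exact per-layer policy; as a generic illustration of progressive, layer-by-layer pruning under tight memory (scoring one parameter tensor at a time instead of the whole model), a magnitude-based sketch might look like:

```python
import torch

@torch.no_grad()
def progressive_magnitude_prune(model, target_sparsity=0.9, rounds=5):
    """Raise each layer's sparsity gradually over several rounds,
    thresholding one parameter tensor at a time so the peak memory of
    the pruning step stays small. A generic magnitude-based sketch, not
    FedTiny's actual per-layer policy."""
    for r in range(1, rounds + 1):
        sparsity = target_sparsity * r / rounds        # tighten gradually
        for name, p in model.named_parameters():
            if p.dim() < 2:                            # skip biases / BN params
                continue
            k = int(p.numel() * sparsity)
            if k == 0:
                continue
            thresh = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > thresh).float())         # zero the smallest weights
        # ...a few batches of local fine-tuning would go here between rounds...
```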
Recently, discrete latent variable models have received a surge of interest in both Natural Language Processing (NLP) and Computer Vision (CV), owing to their performance comparable to continuous counterparts in representation learning while being more interpretable in their predictions. In this paper, we develop a topic-informed discrete latent variable model for semantic textual similarity, which learns a shared latent space for sentence-pair representations via vector quantization. Compared with previous models limited to local semantic contexts, our model can explore richer semantic information via topic modeling. We further boost semantic-similarity performance by injecting the quantized representation into a transformer-based language model with a well-designed semantic-driven attention mechanism. Through extensive experiments on various English-language datasets, we demonstrate that our model surpasses several strong neural baselines on semantic textual similarity tasks.
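The quantization step itself is standard; a minimal VQ-VAE-style sketch of vector quantization with a straight-through gradient is shown below (the codebook size and commitment weight `beta` are illustrative, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def vector_quantize(z, codebook, beta=0.25):
    """Nearest-neighbour vector quantization with a straight-through
    gradient, VQ-VAE style. z: (batch, d); codebook: (K, d)."""
    dists = torch.cdist(z, codebook)        # (batch, K) pairwise distances
    idx = dists.argmin(dim=-1)              # nearest code per input vector
    z_q = codebook[idx]                     # quantized representations
    # codebook loss + beta-weighted commitment loss
    loss = F.mse_loss(z_q, z.detach()) + beta * F.mse_loss(z, z_q.detach())
    z_q = z + (z_q - z).detach()            # straight-through estimator
    return z_q, idx, loss
```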
Cost efficiency of model inference is critical for real-world machine learning (ML) applications, especially for delay-sensitive tasks and resource-limited devices. A typical dilemma is: to provide complex intelligent services (e.g., smart city), we need inference results from multiple ML models, but the cost budget (e.g., GPU memory) is insufficient to run all of them. In this work, we study the underlying relationship among black-box ML models and propose a novel learning task, model linking, which aims to bridge the knowledge of different black-box models by learning mappings (dubbed model links) between their output spaces. We propose a design of model links that supports linking heterogeneous black-box ML models, and, to address the distribution discrepancy challenge, we present adaptation and aggregation methods for model links. Based on our proposed model links, we develop a scheduling algorithm named MLink. Through the collaborative multi-model inference enabled by model links, MLink can improve the accuracy of the inference results obtained under the cost budget. We evaluate MLink on a multi-modal dataset with seven different ML models, and on two real-world video analytics systems with 3,264 hours of video. Experimental results show that our proposed model links can be effectively built among various black-box models. Under a GPU memory budget, MLink saves 66.7% of inference computation while preserving 94% of inference accuracy, outperforming multi-task learning, deep reinforcement learning-based schedulers, and frame-filtering baselines.
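As a hedged illustration only (the paper's actual model-link architecture is not given in this abstract), a model link can be pictured as a small network trained to map one black-box model's outputs onto another's, supervised by running both models on shared inputs:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelLink(nn.Module):
    """A small MLP mapping one black-box model's output space to
    another's, so the target model's result can be approximated
    without running it. Dimensions and depth are illustrative."""
    def __init__(self, src_dim, dst_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dst_dim))

    def forward(self, src_out):
        return self.net(src_out)

def fit_link(link, src_model, dst_model, loader, epochs=3):
    """Supervision comes from running both black boxes on shared inputs."""
    opt = torch.optim.Adam(link.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x in loader:
            with torch.no_grad():                     # models stay black-box
                src_out, dst_out = src_model(x), dst_model(x)
            loss = F.mse_loss(link(src_out), dst_out)
            opt.zero_grad(); loss.backward(); opt.step()
```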
Mobile-centric AI applications place high demands on the resource efficiency of model inference. Input filtering is a promising approach that eliminates redundancy to lower inference cost. Previous efforts have tailored effective solutions for many applications, but two essential questions remain unanswered: (1) the theoretical filterability of an inference workload, to guide the application of input filtering techniques and thereby avoid trial-and-error costs for resource-constrained mobile applications; and (2) robust discriminability of feature embeddings, to allow input filtering to be effective across diverse inference tasks and input content. To answer these, we first formalize the input filtering problem and theoretically compare the hypothesis complexities of the inference model and its input filter to understand the optimization potential. We then propose the first end-to-end learnable input filtering framework, which covers most state-of-the-art methods and yields feature embeddings with robust discriminability. We design and implement InFi, which supports six input modalities and multiple mobile-centric deployments. Comprehensive evaluations confirm our theoretical results and show that InFi outperforms strong baselines in applicability, accuracy, and efficiency. InFi achieves 8.5x throughput and saves 95% of bandwidth while maintaining over 90% accuracy for a video analytics application on mobile platforms.
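Conceptually, an end-to-end learnable input filter is a cheap gate in front of the expensive model; the following sketch (dimensions, threshold, and architecture all assumed for illustration) shows the idea:

```python
import torch
import torch.nn as nn

class InputFilter(nn.Module):
    """A lightweight embedding network plus a binary head: inputs whose
    predicted usefulness falls below a threshold are dropped before the
    expensive inference model runs. All shapes are illustrative."""
    def __init__(self, in_dim, emb_dim=32):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(in_dim, emb_dim), nn.ReLU())
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.embed(x))).squeeze(-1)

def filtered_inference(filter_net, model, x, thresh=0.5):
    keep = filter_net(x) >= thresh   # cheap pass flags redundant inputs
    return keep, model(x[keep])      # expensive model only on kept inputs
```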
Entity alignment (EA) aims to find entities in different knowledge graphs (KGs) that refer to the same object in the real world. Recent studies incorporate temporal information to augment the representations of KGs. Existing methods for EA between temporal KGs (TKGs) utilize a time-aware attention mechanism to incorporate relational and temporal information into entity embeddings, and outperform previous methods by exploiting temporal information. However, we argue that it is unnecessary to learn embeddings of the temporal information in KGs, since most TKGs have uniform temporal representations. We therefore propose a simple graph neural network (GNN) model combined with a temporal information matching mechanism, which achieves better performance with less time and fewer parameters. Furthermore, since alignment seeds are difficult to label in real-world applications, we also propose a method to generate unsupervised alignment seeds from the temporal information of TKGs. Extensive experiments on public datasets indicate that our supervised method significantly outperforms previous methods, and the unsupervised method achieves competitive performance.
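One simple way to realise unsupervised seed generation from temporal information is sketched below, purely as an illustration: Jaccard overlap of timestamp sets is our assumption here, not necessarily the paper's exact matching mechanism:

```python
def temporal_seed_candidates(times1, times2, min_overlap=0.8):
    """Generate unsupervised alignment seeds from temporal signatures:
    entity pairs whose timestamp sets overlap strongly (Jaccard) become
    pseudo-labels. times1/times2 map entity id -> set of timestamps."""
    seeds = []
    for e1, t1 in times1.items():
        best, best_sim = None, 0.0
        for e2, t2 in times2.items():
            union = t1 | t2
            sim = len(t1 & t2) / len(union) if union else 0.0
            if sim > best_sim:
                best, best_sim = e2, sim
        if best is not None and best_sim >= min_overlap:
            seeds.append((e1, best))
    return seeds
```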
We present a novel approach to relocalization or place recognition, a fundamental problem to be solved in many robotics, automation, and AR applications. Rather than relying on often-unstable appearance information, we consider the case where a reference map is given in the form of local objects. Our localization framework relies on 3D semantic object detection followed by association with the objects in the map. Possible sets of candidate associations are grown based on hierarchical clustering with a merging metric that evaluates spatial compatibility. The latter notably uses information about relative object configurations, which is invariant to global transformations. As the camera incrementally explores the environment and detects more objects, the association sets are updated and expanded. We test our algorithm in several challenging scenarios, including dynamic scenes, large viewpoint changes, and scenes with repeated instances. Our experiments show that our approach outperforms prior art in both robustness and accuracy.
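A merging metric based on relative object configurations can be illustrated as follows: pairwise centroid distances are preserved under any global rigid transform, so consistent associations should preserve them (the tolerance and scoring below are assumptions, not the paper's exact metric):

```python
import numpy as np

def spatial_compatibility(map_xyz, det_xyz, assoc, tol=0.2):
    """Score a candidate association set by how well it preserves
    pairwise centroid distances, which are invariant to any global
    rigid transform. assoc: list of (map_idx, det_idx) pairs."""
    n, consistent = len(assoc), 0
    for i in range(n):
        for j in range(i + 1, n):
            (m1, d1), (m2, d2) = assoc[i], assoc[j]
            dm = np.linalg.norm(map_xyz[m1] - map_xyz[m2])  # map distance
            dd = np.linalg.norm(det_xyz[d1] - det_xyz[d2])  # detected distance
            consistent += abs(dm - dd) <= tol
    return consistent / max(n * (n - 1) // 2, 1)  # fraction preserved
```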
Given the rapidly changing machine learning environments and expensive data labeling, semi-supervised domain adaptation (SSDA) is necessary when labeled data from a source domain differ statistically from partially labeled data in a target domain. Most prior SSDA research is conducted in a centralized manner, requiring access to both source and target data. However, data in many fields nowadays are generated by distributed end devices; due to privacy concerns, the data may be stored locally and cannot be shared, rendering existing SSDA research ineffective. This paper proposes an innovative approach to achieve SSDA over multiple distributed and confidential datasets, named Federated Semi-Supervised Domain Adaptation (FSSDA). FSSDA integrates SSDA with federated learning based on a strategically designed knowledge distillation technique, and improves efficiency by performing source and target training in parallel. Moreover, FSSDA controls the amount of knowledge transferred across domains by properly selecting a key parameter, i.e., the imitation parameter. Furthermore, the proposed FSSDA generalizes effectively to multi-source domain adaptation scenarios. Extensive experiments demonstrate the effectiveness and efficiency of the FSSDA design.
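While the abstract does not define the loss precisely, a distillation objective in which an imitation parameter gates the amount of transferred knowledge could plausibly take the following form; everything here beyond the abstract's description is an assumption:

```python
import torch
import torch.nn.functional as F

def fssda_student_loss(student_logits, teacher_logits, labels, labeled_mask,
                       imitation=0.5, T=2.0):
    """Cross-entropy on the few labelled target samples plus a soft-label
    term imitating the source model's predictions; `imitation` gates how
    much cross-domain knowledge is transferred. Hypothetical sketch."""
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    if labeled_mask.any():
        ce = F.cross_entropy(student_logits[labeled_mask], labels[labeled_mask])
    else:
        ce = student_logits.new_tensor(0.0)
    return (1 - imitation) * ce + imitation * kd
```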
Neural network pruning has been an essential technique for reducing the computation and memory requirements of deep neural networks on resource-constrained devices. Most existing research focuses primarily on balancing the sparsity and accuracy of a pruned neural network by strategically removing insignificant parameters and retraining the pruned model. The serious privacy risks that such reuse of training samples poses, due to increased memorization, have not yet been investigated. In this paper, we conduct the first analysis of privacy risks in neural network pruning. Specifically, we investigate the impact of neural network pruning on training data privacy, i.e., membership inference attacks. We first explore the impact of neural network pruning on prediction divergence, where the pruning process disproportionately affects the pruned model's behavior on members versus non-members; moreover, the influence of this divergence varies among different classes in a fine-grained manner. Motivated by this divergence, we propose a self-attention membership inference attack against pruned neural networks. Extensive experiments are conducted to rigorously evaluate the privacy impact of different pruning approaches, sparsity levels, and adversary knowledge. The proposed attack achieves higher attack performance on pruned models than eight existing membership inference attacks. In addition, we propose a new defense mechanism that protects the pruning process by mitigating prediction divergence based on KL-divergence distance; experiments demonstrate that it effectively reduces privacy risks while maintaining the sparsity and accuracy of the pruned models.
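A sketch of such a KL-divergence-based mitigation is given below, under the assumption (ours, not stated in the abstract) that the penalty is applied while fine-tuning the pruned model against the original dense model's predictions:

```python
import torch
import torch.nn.functional as F

def kl_aligned_finetune_loss(pruned_logits, dense_logits, labels, lam=1.0):
    """While fine-tuning the pruned model, add a KL penalty pulling its
    predictive distribution towards the original dense model's, shrinking
    the member/non-member prediction divergence the attack exploits."""
    ce = F.cross_entropy(pruned_logits, labels)
    kl = F.kl_div(F.log_softmax(pruned_logits, dim=-1),
                  F.softmax(dense_logits, dim=-1),
                  reduction="batchmean")
    return ce + lam * kl   # lam trades accuracy for privacy
```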